Method for Encoding a Digital Signal into a Scalable Bitstream;
Method for Decoding a Scalable Bitstream

Background of the Invention 

Recently, with the advances in computers, networking and
communications, streaming audio content over networks such as
the Internet, wireless local area networks, home networks and
commercial cellular phone systems is becoming a mainstream
means of audio service delivery. It is believed that with the
progress of broadband network infrastructures, including
xDSL, fiber optics, and broadband wireless access, bit rates
for these channels are quickly approaching those for delivering
high sampling-rate, high amplitude-resolution (e.g. 96 kHz, 24
bit/sample) lossless audio signals. On the other hand, there
are still application areas where high-compression digital
audio formats, such as MPEG-4 AAC (described in [1]), are
required. As a result, interoperable solutions that bridge the
current channels and the rapidly emerging broadband channels
are in high demand. In addition, even when broadband channels
are widely available and the bandwidth constraint is ultimately
removed, a bit-rate-scalable coding system that is capable of
producing a hierarchical bit-stream whose bit-rate can be
dynamically changed during transmission is still highly
favorable. For example, in applications where packet loss
occurs occasionally due to accidents or resource sharing
requirements, the current broadband waveform representations
such as PCM (Pulse Code Modulation) and lossless coding formats
may suffer serious distortions in a streaming situation.
However, this problem can be solved if packet priorities can
be set when network resources are dynamically
changing. Finally, a bit-rate-scalable coding system also
gives the server an advantage for audio streaming services,
where graceful QoS degradation can be achieved when an
excessive number of requests arrive from client sites.

Many lossless audio coding algorithms have previously been
proposed (see [2]-[8]). Most approaches rely on a prediction
filter to remove the redundancy of the original audio signals,
while the residuals are entropy coded (as described in
[5]-[12]). Because of the predictive filters, the bit-streams
generated by these prediction-based approaches are
difficult and inefficient (see [5], [6]), if not impossible,
to scale to achieve bit-rate scalability. Other approaches,
such as that described in [3], build the lossless audio coder
through a two-layer approach in which the original audio signal
is first coded with a lossy encoder and its residual is then
losslessly coded with a residual encoder. Although this two-layer
design provides some bit-rate scalability, its
granularity is too coarse for audio streaming
applications. Audio codecs that provide fine grain
scalability of bit-rate were previously proposed in [4] and
[18]. However, unlike the system to be discussed here, those
codecs do not provide backward compatibility: the lossy
bit-streams produced by both codecs are incompatible with any
existing audio codec.

In [21], [22], [23] perceptual models are described. 

The object of the invention is to provide a method for encoding
a digital signal into a scalable bitstream wherein backward
compatibility can be maintained.



Summary of the Invention

A method for encoding a digital signal into a scalable
bitstream is provided, which method comprises:

- quantizing the digital signal, and encoding the quantized
signal to form a core-layer bitstream;
- performing an error mapping based on the digital signal and
the core-layer bitstream to remove information that has been
encoded into the core-layer bitstream, resulting in an error
signal;
- bit-plane coding the error signal based on perceptual
information of the digital signal, resulting in an
enhancement-layer bitstream, wherein the perceptual information
of the digital signal is determined using a perceptual model;
and
- multiplexing the core-layer bitstream and the
enhancement-layer bitstream, thereby generating the scalable
bitstream.

Further, an encoder for encoding a digital signal into a
scalable bitstream, a computer readable medium, a computer
program element, a method for decoding a scalable bitstream
into a digital signal, a decoder for decoding a scalable
bitstream into a digital signal, a further computer readable
medium and a further computer program element according to the
method described above are provided.

In one embodiment, a lossless audio codec that achieves fine
grain bit-rate scalability (FGBS) with the following
characteristics is presented:

- Backward compatibility: a high-compression core-layer
bit-stream, such as an MPEG-4 AAC bitstream, is embedded in the
lossless bit-stream.
- Perceptually embedded lossless bit-stream: the lossless
bit-stream can be truncated to any lossy rate without loss of
perceptual optimality in the reconstructed audio.
- Low complexity: it adds only very limited computation on top of
AAC (binary arithmetic codec), as well as very limited memory.

The abundant functionality provided by the presented audio 
codec suggests its capability of serving as a "universal" audio 
format to meet the various rate/quality requirements for 
different audio streaming or storage applications. For example, 
a compliant MPEG-4 AAC bit-stream which is used as the core- 
layer bitstream can be easily extracted from the bit-stream 
generated using the codec for conventional MPEG-4 AAC audio 
services. On the other hand, lossless compression is also 
provided by the codec for audio editing or storage applications 
with a lossless reconstruction requirement. In audio streaming
applications, where FGBS is needed, the lossless bit-stream
of the codec can be further truncated to lower bit-rates at the
encoder/decoder or in the communication channel to meet any
rate/fidelity/complexity constraints that may arise in
practical systems.

In one embodiment, a method for encoding a digital signal to
form a scalable bitstream is provided, wherein the scalable
bitstream can be truncated at any point to produce a lower
quality (lossy) signal when decoded by a decoder. The method
can be used for encoding any type of digital signal, such as
audio, image or video signals. The digital signal, which
corresponds to a physical measured signal, may be generated by
scanning at least one characteristic feature of a corresponding
analog signal (for example, the luminance and chrominance
values of a video signal, the amplitude of an analog sound
signal, or the analog sensing signal from a sensor). For
example, a microphone may be used to capture an analog audio
signal, which is then converted to a digital audio signal by
sampling and quantizing the captured analog audio signal. A
video camera may be used to capture an analog video signal,
which is then converted to a digital video signal using a
suitable analog-to-digital converter. Alternatively, a digital
camera may be used to directly capture an image or video signal
onto an image sensor (CMOS or CCD) as digital signals.

The digital signal is quantized and coded to form a core-layer
bitstream. The core-layer bitstream forms the minimum
bit-rate/quality of the scalable bitstream.

An enhancement-layer bitstream is used to provide an additional
bit-rate/quality of the scalable bitstream. The
enhancement-layer bitstream is formed according to the invention by
performing an error mapping based on the transformed signal and
the core-layer bitstream to generate an error signal. The
purpose of performing error mapping is to remove the
information which has already been coded into the core-layer
bitstream.

The error signal is bit-plane coded to form the
enhancement-layer bitstream. The bit-plane coding of the error signal is
performed based on perceptual information, i.e. the perceived
or perceptual importance, of the digital signal. Perceptual
information as used in the present invention refers to
information which is related to the human sensory system, for
example the human visual system (i.e. the human eye) and the
human auditory system (i.e. the human ear). Such perceptual
information for the digital signal (video or audio) is obtained
using a perceptual model, for example the Psychoacoustic Model
I or II in MPEG-1 audio (described in [21]) for audio
signals, the Human Visual System Model for images (described
in [22]), and the Spatio-Temporal Model used in video
(described in [23]).

The psychoacoustic model is based on the effect that the human
ear is only able to pick up sounds within a certain band of
frequencies depending on various environmental conditions.
Similarly, the HVM (human visual model) is based on the effect
that the human eye is more attentive to certain motion, colors
and contrast.

The core-layer bitstream and the enhancement-layer bitstream 
are multiplexed to form the scalable bitstream. 

The scalable bitstream can be decoded to losslessly reconstruct
the digital signal. As mentioned above, the core-layer
bitstream is an embedded bitstream which forms the minimum
bit-rate/quality of the scalable bitstream, and the
enhancement-layer bitstream forms the lossy to lossless portion of the
scalable bitstream. As the enhancement-layer bitstream is
perceptually bit-plane coded, the enhancement-layer bitstream
can be truncated in such a manner that the data in the
enhancement-layer bitstream which are less perceptually
important are truncated first, to provide perceptual
scalability of the scalable bitstream. In other words, the
scalable bitstream can be scaled by truncating the
enhancement-layer bitstream, so that the enhancement-layer bitstream, and
hence the scalable bitstream, can be perceptually optimized
even when truncated to a lower bit-rate/quality.

The method according to the invention can be used as a lossless
encoder for a digital signal, such as an image, video or audio
signal, in high bandwidth or hi-fidelity systems. When the
bandwidth requirement changes, the bit-rate of the bitstream
generated by the encoder may be changed accordingly to meet the
change in bandwidth requirements. Such a method can be
implemented in many applications and systems such as MPEG audio,
and the image and video compression of JPEG 2000.

According to an embodiment of the invention, the digital signal 
is transformed into a suitable domain before being quantized to 
form the quantized signal. The digital signal may be 
transformed within the same domain, or from one domain to 
another domain in order to better represent the digital signal, 
and thereby to allow an easy and efficient quantizing and 
coding of the digital signal to form the core-layer bit stream. 
Such domains may include, but are not limited to, the time domain,
the frequency domain, and a hybrid of the time and frequency
domains. The transformation of the digital signal may even be
carried out by a unitary matrix, I.

In one embodiment, the digital signal is transformed to a 
transformed signal using an integer Modified Discrete Cosine 
Transform (intMDCT). The intMDCT is a reversible approximation
to the Modified Discrete Cosine Transform (MDCT) filterbank,
which is commonly used in an MPEG-4 AAC coder. Other transforms
for transforming the digital signal into a suitable domain for 
further processing can also be used, including, but not limited 
to, Discrete Cosine Transform, Discrete Sine Transform, Fast 
Fourier Transform and Discrete Wavelet Transform. 

When intMDCT is used to transform the digital signal to the
transformed signal, the transformed signal (specifically the
intMDCT coefficients which describe the transformed signal) is
preferably normalized or scaled to approximate the output of an
MDCT filterbank. The normalizing of the intMDCT-transformed
signal may be useful when a quantizer for
quantizing the transformed signal, for example an AAC quantizer,
has an MDCT filterbank with a global gain different from the
global gain of the intMDCT filterbank. Such a normalizing process
makes the intMDCT-transformed signal approximate the output of the MDCT
filterbank so that it is suitable to be directly quantized and
coded by the quantizer to form the core-layer bitstream.

For encoding an audio digital signal, the digital/transformed
signal is preferably quantized and coded according to the MPEG
AAC specification to generate the core-layer bitstream. This is
because AAC is one of the most efficient perceptual audio
coding algorithms for generating a low bit-rate but high quality
audio bitstream. Therefore, the core-layer bitstream generated
using AAC (referred to as the AAC bitstream) has a low bit-rate, and
even when the scalable bitstream is truncated to the core-layer
bitstream, the perceptual quality of the truncated bitstream is
still high. It should be noted that other quantization and
coding algorithms/methods, for example MPEG-1 Audio Layer 3
(MP3) or other proprietary coding/quantizing methods, can also be
used for generating the core-layer bitstream.

The error mapping, which removes information that has already
been coded into the core-layer bitstream and which generates a
residual signal (or error signal), is performed by subtracting
the lower quantization threshold (closer to zero) of each
quantized value of the quantized signal from the transformed
signal. Such an error mapping procedure based on the quantization
threshold has the advantage that the values of the residual
signal are always positive, and the amplitude of the residual
signal is independent of the quantization threshold. This
allows a low-complexity and efficient embedded coding scheme to
be implemented. It is, however, also possible to subtract a
reconstructed transformed signal from the transformed signal to
generate the residual signal.
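
As a minimal sketch of the threshold-based error mapping
described above, the following Python example uses a
hypothetical uniform quantizer with step size `step` in place
of the AAC quantizer. The function names, the quantizer and the
explicit handling of the sign are assumptions made purely for
illustration. The example works on coefficient magnitudes and
shows that the residual is never negative and that each
coefficient can be recovered exactly.

```python
def quantize(c, step):
    """Hypothetical uniform quantizer: returns a signed integer index."""
    sign = -1 if c < 0 else 1
    return sign * (abs(c) // step)


def lower_threshold(i, step):
    """Magnitude of the lower quantization threshold (the one closer
    to zero) of the interval that is quantized to index i."""
    return abs(i) * step


def error_map(c, i, step):
    """Residual magnitude left after removing what the core layer
    already encodes about the integer coefficient c."""
    return abs(c) - lower_threshold(i, step)


def inverse_error_map(i, e, sign, step):
    """Decoder side: add the lower threshold back to the residual.
    The sign is passed explicitly; how it is signalled is outside
    the scope of this sketch."""
    return sign * (lower_threshold(i, step) + e)


coeffs = [37, -5, 0, 12, -64]   # illustrative integer coefficients
step = 8
indices = [quantize(c, step) for c in coeffs]
residuals = [error_map(c, i, step) for c, i in zip(coeffs, indices)]
signs = [-1 if c < 0 else 1 for c in coeffs]
restored = [inverse_error_map(i, e, s, step)
            for i, e, s in zip(indices, residuals, signs)]
```

With this toy quantizer every residual lies in the range from 0
up to, but not including, `step`, so its amplitude depends only
on the width of the quantization interval and not on where the
threshold lies.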

To determine the perceptual information of the digital signal
for bit-plane coding of the error signal, the psychoacoustic
model can be used as the perceptual model. The psychoacoustic
model may be based on Psychoacoustic Model I or II used in
MPEG-1 audio (as described in [21]), or the Psychoacoustic
Model in MPEG-4 audio (as described in [19]). When a perceptual
quantizer, such as the one used according to AAC, is used for
quantizing and coding the digital/transformed signal, the
perceptual model used in the perceptual quantizer may also be
used to determine the perceptual information for bit-plane
coding of the error signal. In other words, a separate
perceptual model is not needed in this case to provide the
perceptual information for bit-plane coding of the error signal.

The perceptual information for bit-plane coding of the error
signal is preferably also multiplexed with the core-layer and
enhancement-layer bitstreams, as side information, to form the
scalable bitstream. The side information can be used by a
decoder to reconstruct the error signal.

The error signal is arranged in a plurality of bit-planes, with
each bit-plane having a plurality of bit-plane symbols.
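
The arrangement into bit-planes can be illustrated with a short
Python sketch. It assumes non-negative integer residuals; the
helper names `to_bit_planes` and `from_bit_planes` are
hypothetical, and listing the most significant plane first is a
convention chosen only for this example.

```python
def to_bit_planes(residuals, num_planes):
    """Split non-negative integers into bit-planes, most significant
    plane first. planes[p][k] is the bit-plane symbol (0 or 1) of
    residual k in the p-th plane of the list."""
    planes = []
    for p in range(num_planes - 1, -1, -1):
        planes.append([(e >> p) & 1 for e in residuals])
    return planes


def from_bit_planes(planes, num_planes):
    """Rebuild the integers from a list of planes produced above."""
    values = [0] * len(planes[0])
    for idx, plane in enumerate(planes):
        weight = 1 << (num_planes - 1 - idx)
        values = [v + b * weight for v, b in zip(values, plane)]
    return values


residuals = [5, 5, 0, 4, 0]
planes = to_bit_planes(residuals, 3)
rebuilt = from_bit_planes(planes, 3)
```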

In an embodiment of the invention, the arrangement or order of
the bit-planes of the error signal is changed or shifted, and
the bit-planes are subsequently scanned and coded in a
consecutive sequential manner. The bit-planes are shifted in a
way such that when the bit-plane coding is performed on the
shifted bit-planes, bit-planes comprising the more perceptually
important bit-plane symbols are scanned and coded first. In
this embodiment, all the bit-plane symbols in a bit-plane are
coded before coding the bit-plane symbols of a subsequent
adjacent bit-plane.

In another embodiment of the invention, the bit-plane symbols
of the bit-planes are scanned and coded in a sequence based on
the perceptual information. In other words, not all the
bit-plane symbols in a bit-plane are coded before coding the
bit-plane symbols from another bit-plane. The scanning and coding
sequence of the bit-plane symbols from the plurality of
bit-planes is determined based on the perceptual information such
that bit-plane symbols which are more perceptually important
are coded first.

The perceptual information of the digital signal determined by
the perceptual model may include the first (or maximum)
bit-plane M(s) (i.e. a number (index) specifying the first
bit-plane) of the plurality of bit-planes for the bit-plane coding
of the error signal, and/or the Just Noticeable Distortion (JND)
level of the digital signal. It should be noted that the
perceptual information relates to the digital signal for each
different domain characteristic (for example frequency, time,
signal amplitude, etc.) or for a range of domain characteristics.
For example, when the digital signal is transformed to the
frequency domain, the perceptual information of the digital
signal at every frequency or in a band of frequencies (frequency
band s, or more generally, domain band s) may be
different, indicating that the signal may be more important
perceptually at certain frequencies.

In an embodiment of the invention, the perceptual significance Ps(s)
of the digital signal, corresponding to each frequency band s,
is determined as the perceptual information. In this embodiment,
the JND level x(s) of the digital signal corresponding to the
bit-plane of the error signal is determined. The bit-plane
corresponding to the JND level x(s) is then subtracted from the
index of the first bit-plane of the plurality of bit-planes for
the bit-plane coding of the error signal, M(s), to give the
perceptual significance, i.e. Ps(s) = M(s) - x(s). The perceptual
significance Ps(s)
can be used for controlling the shifting of the bit-planes, so
that bit-planes comprising the more perceptually important
bit-plane symbols are scanned and coded first. More advantageously,
the perceptual significance Ps(s) can be used to control the
scanning and coding sequence of the bit-plane symbols from the
plurality of bit-planes so that bit-plane symbols which are
more perceptually important are coded first.
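
The following Python sketch illustrates how a perceptual
significance per band could be derived and used to order the
bit-plane symbols. It is only one possible reading of the text:
the planes of band s are assumed to be numbered from M(s) - 1
down to 0, the JND level is assumed to be given as a bit-plane
index, and the ordering rule (planes lying furthest above the
JND level first, ties broken by band index) is an assumption,
as are all function and variable names.

```python
def perceptual_significance(first_plane, jnd_plane):
    """Perceptual significance per band: first bit-plane index
    minus the bit-plane of the JND level."""
    return [m - x for m, x in zip(first_plane, jnd_plane)]


def scan_order(first_plane, jnd_plane):
    """Order (band, plane) pairs so that planes lying furthest above
    the JND level of their band are visited first (assumed rule)."""
    pairs = []
    for s, (m, x) in enumerate(zip(first_plane, jnd_plane)):
        for plane in range(m - 1, -1, -1):
            # distance of this plane above the JND plane of band s
            pairs.append((plane - x, s, plane))
    pairs.sort(key=lambda t: (-t[0], t[1]))
    return [(s, plane) for _, s, plane in pairs]


M = [4, 3, 5]   # first (maximum) bit-plane index per band, illustrative
x = [1, 2, 4]   # bit-plane of the JND level per band, illustrative
Ps = perceptual_significance(M, x)
order = scan_order(M, x)
```

In this example band 0 has the largest significance, so its
upper planes are visited before any plane of bands 1 and 2, even
though band 2 has the highest first bit-plane.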

In a further embodiment of the invention, the perceptual
significance Ps(s) is normalized to form a normalized perceptual
significance Ps'(s). In this embodiment, a common perceptual
significance Ps_common of the digital signal is defined based
on a function of the perceptual significance Ps(s). Examples of
such a function of the perceptual significance Ps(s) include the
average value, the maximum value, the minimum value or a
normalized value of the perceptual significance Ps(s). The
common perceptual significance Ps_common is subtracted from the
perceptual significance Ps(s) to give the normalized
perceptual significance Ps'(s) for each frequency band s. When
the frequency band s contains at least one non-zero value of the
quantized signal, the frequency band s is a significant band.
Otherwise, the frequency band s is an insignificant band. For
a significant band, the value of the corresponding perceptual
significance Ps(s) is set to the value of the common perceptual
significance Ps_common. For an insignificant band, the
corresponding normalized perceptual significance Ps'(s) is
multiplexed with the core-layer bitstream and the
enhancement-layer bitstream for generating the scalable bitstream for
transmission. This normalized perceptual significance Ps'(s) is
transmitted in the scalable bitstream as side information for
decoding the scalable bitstream in a decoder.

The normalizing of the perceptual significance Ps(s) by
defining a common perceptual significance Ps_common has the
advantage of reducing the amount of perceptual information to
be transmitted in the scalable bitstream by utilizing
information obtained when quantizing the digital/transformed
signal to generate the core-layer bitstream. Therefore,
perceptual information, in particular the normalized perceptual
significance Ps'(s), only needs to be transmitted to the
decoder side for insignificant bands, as such perceptual
information for significant bands can be easily regenerated by
the decoder.
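
A small Python sketch may make the saving in side information
concrete. It assumes that Ps_common is the integer average of
Ps(s) over all bands (one of the options listed above) and that
the decoder already knows Ps_common and which bands are
significant; how these are conveyed is not specified here, and
the function names are hypothetical.

```python
def encode_side_info(ps, significant, ps_common):
    """Return the normalized values Ps'(s) = Ps(s) - Ps_common that
    need to be transmitted: only those of insignificant bands."""
    return {s: p - ps_common
            for s, (p, sig) in enumerate(zip(ps, significant))
            if not sig}


def decode_significance(num_bands, side_info, ps_common):
    """Regenerate Ps(s) for every band at the decoder. Significant
    bands simply take the common value."""
    return [side_info[s] + ps_common if s in side_info else ps_common
            for s in range(num_bands)]


ps = [3, 1, 1, 5]                          # illustrative Ps(s) per band
significant = [True, False, True, False]   # band has a non-zero core value
ps_common = sum(ps) // len(ps)             # assumed choice: integer average
side_info = encode_side_info(ps, significant, ps_common)
decoded = decode_significance(len(ps), side_info, ps_common)
```

Only two of the four bands contribute side information in this
example; the remaining two are reconstructed from Ps_common
alone.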

The index of the first (or maximum) bit-plane of the plurality
of bit-planes for the bit-plane coding of the error signal, M(s),
which is part of the perceptual information of the digital
signal, can be determined from the maximum quantization
interval used for quantizing the digital/transformed signal.
For a significant band, the maximum quantization interval (the
difference between the higher and lower quantization thresholds
corresponding to each quantized value of the quantized signal)
is determined, and the said first bit-plane (specified by M(s))
is determined accordingly. Such a maximum quantization interval
can also be determined at the decoder side, and hence the said
first bit-plane (specified by M(s)) need not be transmitted as
part of the scalable bitstream in this case (for a significant
band).

WO 2005/036528, PCT/SG2004/000323

Although the encoding of a digital signal into a scalable
bitstream is described, it shall also be understood that the
invention also includes the decoding of the scalable bitstream
into a decoded digital signal by the reverse of the method as
described above.



In one embodiment of the invention, a method for decoding the
scalable bitstream into the digital signal is provided which
includes de-multiplexing the scalable bitstream into a
core-layer bitstream and an enhancement-layer bitstream, decoding
and de-quantizing the core-layer bitstream to generate a
core-layer signal, bit-plane decoding the enhancement-layer
bitstream based on
perceptual information of the digital signal, and performing an
error mapping based on the bit-plane decoded enhancement-layer
signal and the de-quantized core-layer signal to generate a
reconstructed transformed signal, wherein the reconstructed
transformed signal is the digital signal. It should be noted
that the method for decoding the scalable bitstream may be used
in combination with, but also separately from, the method for
encoding a digital signal into the scalable bitstream as
described above.



The reconstructed transformed signal may be transformed to
generate the digital signal, if the digital signal is in a
domain different from the reconstructed transformed signal.

The exact implementation of the decoding of the scalable
bitstream to generate the digital signal depends on how the
scalable bitstream is encoded by the encoder. In one example,
the reconstructed transformed signal may be transformed using
intMDCT to generate the digital signal. The core-layer
bitstream may be decoded and de-quantized according to the MPEG
AAC specification. The error mapping is performed by adding the
lower quantization threshold used for de-quantizing the
transformed signal and the bit-plane decoded enhancement-layer
bitstream to generate the reconstructed transformed signal. The
advantages and other implementations of the decoder are similar
to those of the encoder, which have already been described above.

The perceptual information of the digital signal may be
obtained by de-multiplexing the scalable bitstream, if the
perceptual information has been multiplexed into the scalable
bitstream as side information. Alternatively, if the core-layer
bitstream is perceptually encoded, the perceptual information
obtained by decoding and de-quantizing the core-layer bitstream
may be used for bit-plane decoding of the enhancement-layer
bitstream.

In an embodiment of the invention, the enhancement-layer
bitstream is bit-plane decoded in a consecutive sequence to
generate a plurality of bit-planes comprising a plurality of
bit-plane symbols, and the bit-planes are shifted based on the
perceptual information of the digital signal to generate the
bit-plane decoded enhancement-layer bitstream.



In another embodiment of the invention, the enhancement-layer
bitstream is bit-plane decoded in a sequence based on the
perceptual information of the digital signal to generate a
plurality of bit-planes comprising a plurality of bit-plane
symbols, thereby generating the bit-plane decoded
enhancement-layer bitstream.

The perceptual information of the digital signal may be at
least one of the following:

- the bit-plane which corresponds to the enhancement-layer
bitstream when the bit-plane decoding of the enhancement-layer
bitstream starts, M(s); and
- the Just Noticeable Distortion (JND) level of the digital
signal,

wherein s corresponds to a frequency band of the
digital signal.

The bit-plane which corresponds to the enhancement-layer
bitstream when the bit-plane decoding of the enhancement-layer
bitstream starts, M(s), is determined from the maximum
quantization interval used for de-quantizing the core-layer
bitstream.

The second aspect of the invention not only relates to a method
for decoding a scalable bitstream into a digital signal, but
also includes a computer program, a computer readable medium
and a device for implementing the said method.

Detailed Description of the Invention

Various embodiments and implementations of the invention shall 
now be described in detail with reference to the figures, 
wherein: 

Figure 1 shows an encoder according to an embodiment of the
invention.

Figure 2 shows a decoder according to an embodiment of the
invention.

Figure 3 illustrates a structure of a bit-plane coding process.

Figure 4 shows an encoder according to an embodiment of the
invention.

Figure 5 shows a decoder according to an embodiment of the
invention.

Figure 6 shows an encoder according to an embodiment of the
invention.

Figure 7 shows a decoder according to an embodiment of the
invention.

Figure 1 shows an encoder 100 according to an embodiment of the
invention.

The encoder 100 serves for generating a scalable bitstream, and
comprises two distinct layers, namely a core layer which
generates the core-layer bitstream, and a Lossless Enhancement
(LLE) layer which generates the enhancement-layer bitstream.

The encoder comprises a domain transformer 101, a quantizer 102, 
an error mapping unit 103, a perceptual bit-plane coder 104 and 
a multiplexer 105. 

In the encoder 100, the digital signal is first transformed by
the domain transformer 101 to a suitable domain, such as the
frequency domain, resulting in a transformed signal. The
coefficients of the transformed signal are quantized by the
quantizer 102 and coded to generate the core-layer bitstream.
Error mapping is performed by the error mapping unit 103, which
corresponds to the LLE layer, to remove from
the coefficients of the transformed signal the information that
has been used
or coded in the core layer to form the core-layer bitstream.
The resultant residual or error signal, specifically the error
coefficients, is bit-plane coded by the bit-plane coder 104 to
generate the embedded LLE bitstream. This embedded bitstream
can be further truncated to lower bit-rates at the encoder 100
or at a corresponding decoder (such as the decoder 200 shown in
Figure 2 and described below), or in the communication channel,
to meet the rate/fidelity requirements. A perceptual model 106
is used to control the bit-plane coding of the error
coefficients, so that the bits of the error coefficients which
are more perceptually significant are coded first.

Finally, the resultant LLE layer bitstream is multiplexed with
the core-layer bitstream by the multiplexer 105 to generate the
scalable bitstream. In addition, perceptual information for
controlling the bit-plane coding of the error coefficients may
also be transmitted as side information so that a corresponding
bit-plane decoder is able to reconstruct the error coefficients
in the correct order.

When the LLE bitstream is truncated to lower rates, the
decoded signal is a lossy version of the original input
signal.

Figure 2 shows a decoder 200 according to an embodiment of the
invention.

The decoder 200 decodes a scalable bitstream generated by the
encoder 100 to reconstruct the digital signal which was encoded
by the encoder 100.

The decoder 200 comprises a domain transformer 201, a
de-quantizer 202, an error mapping unit 203, a perceptual
bit-plane decoder 204 and a de-multiplexer 205.

The de-multiplexer 205 receives the scalable bitstream as input
and splits the scalable bitstream into the core-layer bitstream
and the enhancement-layer bitstream as generated by the encoder
100. The core-layer bitstream is decoded and de-quantized by
the de-quantizer 202 to form the core-layer signal. The
enhancement-layer bitstream is perceptually bit-plane decoded
by the perceptual bit-plane decoder 204, based on the perceptual
information given by a perceptual model 206, and is
subsequently error mapped by the error mapping unit 203 with
the core-layer signal to generate an enhancement-layer signal.
The enhancement-layer signal is finally transformed back to the
domain of the digital signal by the domain transformer 201,
resulting in an enhancement-layer transformed signal which is
the reconstructed digital signal.
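
The data flow through the encoder 100 and the decoder 200 can
be sketched in Python as follows. This is only a toy model
under stated assumptions: the domain transform is the identity,
the core layer is a hypothetical uniform quantizer with step
`STEP`, no entropy coding or perceptual ordering is applied, and
the "bitstream" is a plain dictionary. None of these choices is
taken from the text; they merely make the roles of the units
visible and show the effect of truncating the enhancement layer.

```python
STEP = 8     # hypothetical core-layer quantizer step
PLANES = 3   # bit-planes needed for residuals in the range [0, STEP)


def encode(signal):
    """Toy encoder: core layer plus bit-plane coded enhancement layer."""
    core = [abs(c) // STEP for c in signal]              # quantizer 102
    signs = [c < 0 for c in signal]
    error = [abs(c) - i * STEP                           # error mapping 103
             for c, i in zip(signal, core)]
    planes = [[(e >> p) & 1 for e in error]              # bit-plane coder 104
              for p in range(PLANES - 1, -1, -1)]
    return {"core": core, "signs": signs, "enh": planes}  # multiplexer 105


def decode(stream, keep_planes=PLANES):
    """Toy decoder; keep_planes < PLANES simulates a truncated stream."""
    error = [0] * len(stream["core"])
    for idx, plane in enumerate(stream["enh"][:keep_planes]):
        weight = 1 << (PLANES - 1 - idx)
        error = [e + b * weight for e, b in zip(error, plane)]
    # inverse error mapping: lower threshold plus decoded residual
    return [(-1 if neg else 1) * (i * STEP + e)
            for i, e, neg in zip(stream["core"], error, stream["signs"])]


signal = [37, -5, 0, 12, -64]
stream = encode(signal)
lossless = decode(stream)                  # all planes: exact
lossy = decode(stream, keep_planes=1)      # truncated enhancement layer
core_only = decode(stream, keep_planes=0)  # core layer alone
```

Decoding with all planes returns the input exactly, while
dropping planes gives progressively coarser approximations, with
the core layer alone as the lowest quality.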

The processing carried out by the encoder 100 and the decoder
200 is explained in detail in the following.

The input signal is normally transformed to the frequency
domain by the domain transformer 101 before it is quantized by
the quantizer 102 (which is part of the core-layer encoder) to
generate the core-layer bitstream. Various transform functions
may be used for transforming the input signal to the frequency
domain, such as the Discrete Cosine Transform (DCT), the Modified
Discrete Cosine Transform (MDCT), the integer MDCT (IntMDCT) or the
Fast Fourier Transform (FFT).

5 

When MPEG-4 AAC encoder is used as the core-layer encoder (for 
audio signal), MDCT is commonly used to transform the input 
audio signal to the frequency domain, as described in [1] . In 
[13], integer MDCT (IntMDCT) is proposed as a revertible 
10 approximation to the Modified Discrete Cosine Transform (MDCT) 
filterbank used with the MPEG-4 AAC encoder. A generally used 
way to implement the IntMDCT is to factor! ze the MDCT 
filterbank into a cascade of Givens rotations in the form of: 



/cos a - sin a\ 
I sin a cos a J ' 



15 which is further factorized into three lifting steps 

cos a - 1\ 



/cos a - sin a\ 11 : / 1 0 

[sin a cos a j L 1 [sin a 1 



/ cos a - 1 \ 
sin a 
0 1 



Each lifting step can be approximated by a reversible
integer-to-integer mapping that uses the rounding-to-the-nearest-
integer operation $r: \mathbb{R} \rightarrow \mathbb{Z}$. For
example, the last lifting step is approximated by:

\[
(x_1, x_2) \rightarrow \left( x_1 + r\!\left( \frac{\cos\alpha - 1}{\sin\alpha}\, x_2 \right),\; x_2 \right),
\]

which can be losslessly reverted by:

\[
(x_1', x_2') \rightarrow \left( x_1' - r\!\left( \frac{\cos\alpha - 1}{\sin\alpha}\, x_2' \right),\; x_2' \right).
\]

IntMDCT is thus obtained by implementing all the Givens
rotations with the reversible integer mapping described
above.

In the decoder, IntMDCT can again be used by the domain
transformer 201 to transform the enhancement-layer signal to
the (reconstructed) digital signal.
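The lifting factorization can be illustrated with a minimal Python sketch of a single integer Givens rotation and its exact inverse. The function names and the use of Python's built-in `round` (which rounds ties to even) are assumptions made for illustration; any deterministic rounding works as long as encoder and decoder use the same one, and $\sin\alpha$ must be non-zero.

```python
import math

def int_rotate(x1, x2, alpha):
    """Integer approximation of a Givens rotation via three lifting steps."""
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)  # shear coefficient
    s = math.sin(alpha)
    x1 = x1 + round(a * x2)   # rightmost lifting matrix
    x2 = x2 + round(s * x1)   # middle lifting matrix
    x1 = x1 + round(a * x2)   # leftmost lifting matrix
    return x1, x2

def int_rotate_inverse(y1, y2, alpha):
    """Undo int_rotate exactly by reverting the lifting steps in reverse order."""
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    y1 = y1 - round(a * y2)
    y2 = y2 - round(s * y1)
    y1 = y1 - round(a * y2)
    return y1, y2
```

Because each step only adds a rounded function of the coordinate it leaves unchanged, the same rounded value can be recomputed and subtracted in the inverse, which is what makes the mapping lossless.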

At the core layer, the coefficients c(k) of the transformed
signal, where k = 1, ..., 1024 and 1024 is the length of a frame
of the core-layer bitstream, are quantized by the quantizer 102
and coded into the core-layer bitstream. In the context of an
input audio signal, the transformed signal coefficients may be
quantized according to the quantization values of an MPEG-4 AAC
coder, an MPEG-1 Layer 3 Audio (MP3) coder or any proprietary
audio coder.

When the MPEG-4 AAC coder is used in conjunction with the
IntMDCT, the transformed signal coefficients (also known as the
IntMDCT coefficients), c(k), are first normalized as

\[
c'(k) = \alpha \cdot c(k)
\]

to approximate the normalized outputs to the outputs of the
MDCT filterbank. The normalized IntMDCT coefficients, c'(k),
are then quantized and coded, for example, according to an AAC
quantizer (see [19]) which is given as follows:

\[
i(k) = \mathrm{sgn}[c'(k)] \left\lfloor \left( \frac{|c'(k)|}{\sqrt[4]{2^{\mathrm{scale\_factor}(s)}}} \right)^{3/4} + 0.4054 \right\rfloor .
\]



Here $\lfloor \cdot \rfloor$ denotes the flooring operation which
truncates a floating-point operand to an integer, i(k) are the
AAC quantized coefficients and scale_factor(s) is the scale
factor of the scale-factor band s to which the coefficient c(k)
belongs. The scale factors can be adjusted adaptively by a noise
shaping procedure so that the quantization noise is best masked
by the masking threshold of the human auditory system. A widely
adopted approach for this noise shaping procedure is the nested
quantization and coding loop as described in detail in [1].

The quantized coefficients i(k) are noiselessly coded (in this
example by the quantizer 102), for example, using a Huffman code
or the Bit-Sliced Arithmetic Code (BSAC) as described in [17].
BSAC is preferred if bit-rate scalability is further required in
the core-layer bitstream. The scale factors are differentially
encoded, for example, by the DPCM encoding process described in
[1], or using a Huffman code. The core-layer bitstream can then
be generated by multiplexing all the coded information
according to the AAC bitstream syntax.

A more comprehensive description on MPEG AAC can be found in 
[1] or in the International standard document on MPEG AAC [19] . 

It should be noted that although the mechanism of embedding an
MPEG-4 AAC compliant bit-stream is described, it is also
possible to use bitstreams which are compliant with other coders
such as MPEG 1/2 Layer I, II, III (MP3), Dolby AC3, or Sony's
ATRAC proprietary encoders as described in [20].

When the quantizer 102 works according to the MPEG AAC coder, the
de-quantizer 202 preferably works according to an MPEG AAC
decoder for decoding and de-quantizing the core-layer bit-stream
in the decoder 200. Specifically, the de-quantizer 202
is used to generate the core-layer signal which is subsequently
used for error mapping by the error mapping unit 203 in the
decoder 200 to generate the enhancement-layer signal as will be
described below.

However, it should be noted that de-quantizers according to
other specifications such as MP3 or other proprietary decoders
may be used in the decoder 200.



In the LLE layer, an error mapping procedure is employed to
remove the information that has already been coded in the
core-layer bit-stream. A possible approach to build such an error
mapping procedure is by subtracting the lower (closer to zero)
quantization threshold of each quantized coefficient from the
corresponding transformed input signal coefficient.

This can be illustrated as:

\[
e(k) = c(k) - thr(k),
\]

where thr(k) is the lower (closer to zero) quantization
threshold for c(k), and e(k) is the error coefficient which
represents the error signal.

When the MPEG-4 AAC coder is used as the quantizer:

\[
thr(k) = \mathrm{sgn}[c(k)] \left\lfloor \frac{\sqrt[4]{2^{\mathrm{scale\_factor}(s)}} \, \bigl[\, |i(k)| - 0.4054 \,\bigr]^{4/3}}{\alpha} \right\rfloor .
\]



In practical applications, to ensure robust reconstruction, the
mapping from integer i(k) to integer thr(k) may be performed
using a lookup table. As can be seen from the above formula, a
total of 4 tables is required for the different values of
scale_factor: scale factors that are congruent modulo 4 can
share the same table by bit-shifting, and each table contains
the mapping between all possible values of i(k) and the
corresponding thr(k) for any scale factor from the set of those
with the same value modulo 4.

It is also possible to perform the error mapping procedure by
subtracting a reconstructed coefficient of the transformed
input signal from the transformed signal coefficient as
described in [3], which can be illustrated as:

\[
e(k) = c(k) - \hat{c}(k),
\]

wherein $\hat{c}(k)$ is the reconstructed transformed signal
coefficient.

In general, it is also possible to perform the error mapping
procedure using:

\[
e(k) = c(k) - f(k),
\]

wherein f(k) is any function which corresponds to c(k), such as

\[
f(k) = \tfrac{1}{2} \bigl( thr(k+1) - thr(k) \bigr).
\]
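The threshold-based error mapping described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the reference AAC implementation: the normalization factor `ALPHA` is set to a hypothetical value of 1.0, the function names are invented for this example, the threshold is floored to an integer (an assumption consistent with thr(k) being stored as an integer), and thr(k) is taken as 0 when i(k) = 0.

```python
import math

ALPHA = 1.0  # hypothetical normalization factor alpha

def aac_quantize(c, scale_factor, alpha=ALPHA):
    """Quantization index i(k) of an integer coefficient c(k)."""
    step = 2.0 ** (scale_factor / 4.0)          # 4th root of 2^scale_factor
    mag = int(math.floor((abs(alpha * c) / step) ** 0.75 + 0.4054))
    return mag if c >= 0 else -mag

def lower_threshold(i, scale_factor, alpha=ALPHA):
    """Lower (closer to zero) quantization threshold thr(k) for index i(k)."""
    if i == 0:
        return 0                                 # insignificant coefficient
    step = 2.0 ** (scale_factor / 4.0)
    mag = int(math.floor(step * (abs(i) - 0.4054) ** (4.0 / 3.0) / alpha))
    return mag if i > 0 else -mag

def error_map(c, i, scale_factor, alpha=ALPHA):
    """Encoder side: e(k) = c(k) - thr(k)."""
    return c - lower_threshold(i, scale_factor, alpha)

def error_unmap(e, i, scale_factor, alpha=ALPHA):
    """Decoder side: c(k) = e'(k) + thr(k)."""
    return e + lower_threshold(i, scale_factor, alpha)
```

Since thr(k) depends only on i(k) and the scale factor, both of which are carried in the core-layer bitstream, the decoder can recompute it without extra side information, and e(k) keeps the sign of c(k) for significant coefficients.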

Clearly, for c(k) that is already significant in the core
layer (thr(k) ≠ 0), the sign of the IntMDCT residual e(k) can
be determined from the core-layer reconstruction, and hence only
its amplitude needs to be coded in the LLE layer. In
addition, it is well known that for most audio signals, c(k)
can be approximated by Laplacian random variables with the
probability density function (pdf):

\[
f(c(k)) = \frac{e^{-|c(k)| \sqrt{2/\sigma^2}}}{\sqrt{2\sigma^2}},
\]






where $\sigma^2$ is the variance of c(k). From the "memoryless"
property of a Laplacian pdf it is easy to verify that the
amplitude of e(k) is geometrically distributed as

\[
f(|e(k)|) = (1 - \theta(k)) \cdot \theta(k)^{|e(k)|}, \qquad (1)
\]

where the distribution parameter $\theta(k)$ is determined by the
variance of c(k) and the step size of the core-layer quantizer.
This property enables a very efficient bit-plane coding scheme,
such as the bit-plane Golomb code (BPGC), to be applied for
encoding the error signal.

In the decoder 200, the coefficients of the transformed signal
may be reconstructed by the error mapping procedure performed
by the error mapping unit 203 according to the following
equation:

\[
c(k) = e'(k) + thr(k),
\]

wherein e'(k) are the decoded error coefficients which describe
the bit-plane decoded enhancement-layer bitstream and which
correspond to the error coefficients e(k) in the encoder 100.
Hence it can be seen that the transformed signal
coefficients c(k) can be regenerated from the decoded error
coefficients e'(k) (possibly a lossy version if the LLE
bit-stream is truncated to lower rates) and the quantization
threshold thr(k), which is generated in the same manner as in
the encoder from the quantization index i(k) contained in the
embedded core-layer (AAC) bitstream.

Similar to the encoder 100, the transformed signal coefficients
c(k) in the decoder 200 may also be generated by adding
the decoded error coefficients e'(k) and the reconstructed
coefficients of the core-layer bitstream. Also, the transformed
signal coefficients c(k) may be generated by adding the
decoded error coefficients e'(k) and a function of c(k).

To produce the scalable-to-lossless portion of the final
embedded lossless bit-stream, the residual or error signal is
further coded in the LLE layer by the perceptual bit-plane coder
104 using bit-plane coding, an embedded coding technology that
has been widely adopted in audio coding [3] and image coding [5].

A description of a general bit-plane coding procedure can be
found in [4] and [15]. Consider an input n-dimensional data
vector $x_n = \{x_1, \ldots, x_n\}$, where $x_i$ is extracted
from some random source of some alphabet $A \subset \mathbb{R}$.
Clearly, $x_i$ can be represented in a binary format

\[
x_i = (2 s_i - 1) \cdot \sum_{j=-\infty}^{\infty} b_{i,j} \cdot 2^{j}, \quad i = 1, \ldots, n,
\]

by cascading binary bit-plane symbols that comprise a
sign symbol

\[
s_i = \begin{cases} 1, & x_i \geq 0 \\ 0, & x_i < 0 \end{cases}
\]

and amplitude symbols $b_{i,j} \in \{0, 1\}$. In practice, the
bit-plane coding can be started from the maximum bit-plane M of
the vector $x_n$, where M is an integer that satisfies

\[
2^{M-1} \leq \max\{|x_i|\} < 2^{M}, \quad i = 1, \ldots, n,
\]

and stopped at bit-plane 0 if $x_n$ is an integer vector.
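For an integer vector, the sign/amplitude decomposition above can be sketched in a few lines of Python. The function names are hypothetical, and `int.bit_length()` is used as a convenient way to obtain the M that satisfies the inequality (it returns 0 for an all-zero vector).

```python
def max_bitplane(x):
    """Smallest M with 2^(M-1) <= max|x_i| < 2^M (0 for an all-zero vector)."""
    return max(abs(v) for v in x).bit_length()

def decompose(x):
    """Split an integer vector into sign symbols and amplitude bit-planes."""
    m = max_bitplane(x)
    signs = [1 if v >= 0 else 0 for v in x]            # s_i
    planes = {j: [(abs(v) >> j) & 1 for v in x]        # b_{i,j}
              for j in range(m - 1, -1, -1)}
    return m, signs, planes

def recompose(signs, planes):
    """Rebuild x_i = (2 s_i - 1) * sum_j b_{i,j} 2^j."""
    return [(2 * signs[i] - 1) * sum(bits[i] << j for j, bits in planes.items())
            for i in range(len(signs))]
```

For example, `decompose([9, -7, 14, 1])` yields M = 4, the sign symbols `[1, 0, 1, 1]` and four amplitude bit-planes, from which `recompose` restores the original vector.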

The bit-plane coding and decoding process according to one
embodiment of the invention, as for example performed by the
perceptual bit-plane coder 104 and the perceptual bit-plane
decoder 204, is explained in the following with reference to
figure 3.

Figure 3 illustrates a structure of the above bit-plane coding
(BPC) process, where each input vector is first decomposed into
the binary sign and amplitude symbols, which are then scanned,
in a desired order, by a bit-plane scanning unit 301 and coded
by an entropy coder 302 (e.g. using an arithmetic code, a Huffman
code or a run-length code). In addition, a statistical model
303, for example based on a Laplacian distribution of the input
signal, is usually used to determine the probability assignment
for each binary symbol to be coded. In the corresponding
decoder, the data flow is reversed, i.e. the output of the
entropy coder 302 is decoded by an entropy decoder 303 using
a corresponding statistical model 304, and the result is used by
a bit-plane reconstruction unit 304 to rebuild the bit-planes,
where the sign and amplitude symbols which are decoded to
rebuild the bit-planes of the data vector follow the same
scanning order as in the encoder.

The most significant advantage of having a bit-plane coding
system as above is that the resulting compressed bit-stream can
be easily truncated to any desirable rate, where a reproduction
data vector can still be obtained from the partially
reconstructed bit-planes decoded from this truncated bit-stream.
For best coding performance, an embedded principle (see [24]) is
usually adopted in BPC, according to which the bit-plane symbols
are coded in the order of decreasing rate-distortion slope so
that symbols with the most significant contribution to the final
distortion per unit rate are always coded first.

The selection of the order of bit-plane scanning depends on the
desired distortion measurement. The mean square error (MSE),
or the expectation of the square error function, may be used as
the distortion measurement:

\[
d(x_n, \hat{x}_n) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2,
\]

wherein $d(x_n, \hat{x}_n)$ is the distortion value, $x_n$ is the
original data vector, and $\hat{x}_n$ is the reconstructed vector
of $x_n$ at the decoder. In this case, results from [24] show that
the embedded principle is satisfied well by a sequential
bit-plane scanning and coding procedure for most sources, except
those with a very skewed bit-plane symbol distribution.

An example of a simple sequential bit-plane scanning and coding 
procedure comprises the following steps: 

1. Start from the most significant bit-plane j = M - 1;

2. Encode only $b_{i,j}$ with $b_{i,M-1} = b_{i,M-2} = \ldots = b_{i,j+1} = 0$.
If $b_{i,j} = 1$ in the significance scan, encode $s_i$ (significance
pass);

3. Encode $b_{i,j}$ that are not encoded in the significance pass
(refinement pass);

4. Progress to bit-plane j - 1.

List 1. Bit-plane scanning & coding procedure

The above procedure is iterated until a certain terminating
criterion, which is usually a pre-defined rate/distortion
constraint, is reached. In addition, further adjustment of the
coding sequence in a significance pass may be required if
bit-plane symbols are found to have unequal distributions.

An example of the above sequential coding procedure is
illustrated by considering a data vector x with dimension 4,
say {9, -7, 14, 1}. It is bit-plane coded starting from its most
significant bit-plane, i.e. the fourth bit-plane (j = 3). Coding
begins with the significance pass, since all elements are still
insignificant. (X denotes the bypass symbols.) The sign is coded
as follows: positive is coded as 1, and negative is coded as 0.

Data vector                    9            -7           14           1
1st significance pass (sign)   1 (sign: 1)  0            1 (sign: 1)  0
1st refinement pass            X            X            X            X
2nd significance pass (sign)   X            1 (sign: 0)  X            0
2nd refinement pass            0            X            1            X
3rd significance pass          X            X            X            0
3rd refinement pass            0            1            1            X
4th significance pass          X            X            X            1 (sign: 1)
4th refinement pass            1            1            0            X

Thus the output binary stream is 11011010001001111110, which is
then entropy coded and sent to the decoder. At the decoder, the
bit-plane structure of the original data vector is
reconstructed. If the entire binary stream is received by the
decoder, the bit-planes of the original data vector can be
restored and thus a lossless reconstruction of the original
data vector is obtained. If only a subset (the most significant
part) of the binary stream is received, the decoder is still able
to restore part of the bit-planes of the original data vector, so
that a coarse (quantized) reconstruction of the
original data vector is obtained.
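The worked example can be reproduced with a short Python sketch of the sequential procedure in List 1. The function names are hypothetical and the entropy coding stage is omitted, so the functions operate directly on the raw binary symbol string. On truncation the decoder simply stops; a sign that was cut off defaults to positive, which is a simplification of this sketch.

```python
def bitplane_encode(x):
    """Sequential significance/refinement coding of an integer vector."""
    m = max(abs(v) for v in x).bit_length()
    significant = [False] * len(x)
    out = []
    for j in range(m - 1, -1, -1):
        was = list(significant)              # state before this bit-plane
        for i, v in enumerate(x):            # significance pass
            if not was[i]:
                bit = (abs(v) >> j) & 1
                out.append(bit)
                if bit:
                    out.append(1 if v >= 0 else 0)   # sign symbol
                    significant[i] = True
        for i, v in enumerate(x):            # refinement pass
            if was[i]:
                out.append((abs(v) >> j) & 1)
    return m, "".join(str(b) for b in out)

def bitplane_decode(bits, n, m):
    """Rebuild the vector from a (possibly truncated) symbol string."""
    stream = iter(bits)
    mag, sign, significant = [0] * n, [1] * n, [False] * n
    try:
        for j in range(m - 1, -1, -1):
            was = list(significant)
            for i in range(n):
                if not was[i] and next(stream) == "1":
                    mag[i] |= 1 << j
                    significant[i] = True
                    sign[i] = 1 if next(stream) == "1" else -1
            for i in range(n):
                if was[i] and next(stream) == "1":
                    mag[i] |= 1 << j
    except StopIteration:
        pass                                  # truncated stream: stop here
    return [s * a for s, a in zip(sign, mag)]
```

Encoding `[9, -7, 14, 1]` gives M = 4 and the string `11011010001001111110` from the table; decoding the full string restores the vector, while decoding only its first six symbols gives the coarse reconstruction `[8, 0, 8, 0]`.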

The above is only a simple example of a bit-plane scanning and
coding procedure. In practice, the significance pass can be
further subdivided to exploit the statistical correlation of
elements in the data vector, as in the bit-plane coding
process in JPEG2000, or that in the embedded audio coder (EAC)
described in [4].

The above sequential bit-plane scanning and coding procedure
only attempts to optimize the MSE performance. In the
areas of audio, image or video coding, minimizing the perceptual
distortion instead of the MSE is normally a more efficient coding
method for obtaining optimal perceptual quality in the
reconstructed audio, image or video signal. Therefore, the
sequential bit-plane coding of the error signal is definitely a
sub-optimal option.

In the encoder 100, the error coefficients are preferably
grouped into frequency bands so that each frequency band s
contains a number of error coefficients in consecutive order.
(The scale factor band grouping may be based on the band
grouping adopted in the quantizer 102 if a perceptual coder is
used as the quantizer 102. However, other band groupings are also
possible.)

A frequency band s is said to be significant if there exists an
error coefficient in the frequency band s such that the
quantization threshold thr(k) from the quantizer is not zero. In
other words, if e(k) is an error coefficient in frequency band
s:

\[
e(k) = c(k) - thr(k),
\]

frequency band s is significant if thr(k) ≠ 0 for some k in s
(thr(k) = 0 when i(k) = 0, and hence e(k) = c(k)); otherwise
frequency band s is considered insignificant.

Perceptual significance of bits of the error coefficients can
be determined by the level of Just Noticeable Distortion (JND)
at a frequency location i. This level of JND, $T_i$, can be
determined from a perceptual model such as psychoacoustic model
(I or II) or any proprietary perceptual model. When a
perceptual quantizer is used for forming the core-layer
bitstream, the perceptual model used in the quantizer may also
be used to generate the JND for perceptual bit-plane coding of
the error coefficients.

For simplicity, the perceptual significance of bits of the error
coefficients in the same frequency band s can be set to the same
value.

In the following, a possible implementation of perceptual bit- 
plane coding is explained with reference to Fig. 4. 

Figure 4 shows an encoder 400 according to an embodiment of the 
invention. 

Analogously to the encoder 100, the encoder 400 comprises a
domain transformer 401, a quantizer 402, an error mapping unit
403, a perceptual bit-plane coder 404 (using a perceptual model
406) and a multiplexer 405.

The perceptual BPC block, i.e., the perceptual bit-plane coder 
404 comprises a bit-plane shifting block 407 and a conventional 
BPC block 408. 

In the bit-plane shifting block 407, the bit-planes are
perceptually shifted, and the perceptually shifted bit-planes
are coded in the BPC block 408 in a conventional sequential
scanning and coding manner.

Consider the following (modified) perceptually weighted
distortion measurement

\[
d(x_n, \hat{x}_n) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2 \, w_i(x_i).
\]

In the context of perceptual audio coding, the audio signal is
usually quantized and coded in the frequency domain so that the
data vector $x_n$ is the transformed audio signal and the
weighting function $w_i(x_i)$ is the importance of $x_i$ at
different frequency locations i, i.e.,

\[
w_i = \frac{1}{T_i}.
\]

The above perceptually weighted distortion function may be
re-written as follows:

\[
d(x_n, \hat{x}_n)
= \frac{1}{n} \sum_{i=1}^{n} \frac{(x_i - \hat{x}_i)^2}{T_i}
= \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i}{\sqrt{T_i}} - \frac{\hat{x}_i}{\sqrt{T_i}} \right)^2
= \frac{1}{n} \sum_{i=1}^{n} (\tilde{x}_i - \hat{\tilde{x}}_i)^2,
\]

where

\[
\tilde{x}_i = \frac{1}{\sqrt{T_i}} \, x_i, \quad i = 1, \ldots, n.
\]

Hence the weighted square error function now becomes a square
error function on the scaled vector
$\tilde{x}_n = \{\tilde{x}_1, \ldots, \tilde{x}_n\}$. Therefore,
perceptually optimized coding of $x_n$ can be achieved by simply
performing sequential bit-plane coding on $\tilde{x}_n$. In the
corresponding decoder, each element of the bit-plane decoded
data vector $\tilde{x}'$ can be scaled back to obtain a
reconstructed data vector $\hat{x}$ as follows:

\[
\hat{x}_i = \sqrt{T_i} \cdot \tilde{x}'_i, \quad i = 1, \ldots, n.
\]

Clearly, the weights $T_i$ are preferably transmitted to the
decoder as side information if they are unknown in the decoder.

$T_i$ is further quantized to an even integer power of 2 so that
it becomes

\[
T_i = 2^{2\tau_i},
\]

where

\[
\tau_i = \tfrac{1}{2} \log_2 T_i,
\]

and the scaled data vector can thus be obtained by bit-shifting
each element in the original data vector as follows

\[
\tilde{x}_i = 2^{-\tau_i} x_i,
\]

which is easily obtained by performing a right-shifting operation
on $x_i$ by $\tau_i$. For example, if $x_i$ = 00010011 and
$\tau_i = -2$, the scaled data vector element $\tilde{x}_i$ is
then 01001100; if $\tau_i = 2$ it will become 00000100.11.

In this way, the bit-planes of the error coefficients are
perceptually shifted in a manner such that when sequential
bit-plane coding is performed on the shifted bit-planes, bits
which are more perceptually significant (instead of having the
highest MSE) can be encoded first.

Clearly, if each element in the original data vector is an integer
with limited word length, e.g., if each element in x has a
maximum bit-plane of L, lossless coding of x can be achieved
if every $\tilde{x}_i$ in the scaled vector is bit-plane coded
over bit-planes $-\tau_i$ to $L - \tau_i$.
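The effect of the perceptual shift on the scanning order can be sketched in Python as follows. Instead of forming fractional values, this sketch tracks, for each element, which original bit j appears on which shifted bit-plane p (with j = p + tau_i). The function name and the list-of-tuples output are assumptions made for illustration, and sign symbols and significance/refinement passes are left out.

```python
def shifted_scan(x, tau):
    """List (shifted_plane, index, bit) in sequential order over shifted planes.

    x   : integer data vector
    tau : integer shift per element (tau_i = 0.5 * log2(T_i))
    """
    word_len = max(abs(v) for v in x).bit_length()   # bits 0 .. word_len-1
    top = word_len - 1 - min(tau)                     # highest shifted plane
    bottom = -max(tau)                                # lowest shifted plane
    order = []
    for p in range(top, bottom - 1, -1):
        for i, (v, t) in enumerate(zip(x, tau)):
            j = p + t                                 # original bit-plane
            if 0 <= j < word_len:
                order.append((p, i, (abs(v) >> j) & 1))
    return order
```

With `x = [19, 19]` and `tau = [-2, 2]`, the first element (low JND, shifted up by two planes) has four of its bits scanned before the first bit of the second element is reached, even though both elements have the same magnitude.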

As mentioned earlier, information on the perceptual 
significance such as the level of JND can be provided to the 
bit-plane shifting block from a perceptual model . 

In the bit-plane coding process, a maximum bit-plane, M(s), can
be used to specify the starting bit-plane at which the
bit-plane scanning and coding should start. The maximum bit-plane
M(s) and $\tau_i$ should preferably be transmitted as side
information in the scalable bitstream to the corresponding
decoder in order for the decoder to be able to decode the
bitstream correctly. To reduce the amount of side information,
M(s) and $\tau_i$ may be constrained to the same value for the
same scale factor band s in the encoder.

The value of the maximum bit-plane M(s) in each frequency band
s can be determined from the error coefficients e(k) using the
following expression:

\[
2^{M(s)-1} \leq \max(|e(k)|) < 2^{M(s)}, \quad k \in s.
\]

Furthermore, the maximum absolute value of the error
coefficients max(|e(k)|) in each significant frequency band s
is bounded by the quantizer interval of the perceptual
quantizer:

\[
\max(|e(k)|) \leq thr(i(k)+1) - thr(i(k)).
\]

Therefore, the maximum bit-plane M(s) for each significant
frequency band s can be determined from the following
expression:

\[
2^{M(s)-1} \leq \max\bigl( \bigl|\, |thr(i(k)+1)| - |thr(i(k))| \,\bigr| \bigr) < 2^{M(s)}, \quad k \in s.
\]

Since the quantized coefficients i(k) of the perceptual
quantizer are known to the decoder, it is not necessary to
transmit the maximum bit-plane M(s) as side information to the
decoder for the significant frequency bands s.

The value of the maximum bit-plane M(s) may also be predefined
in the encoder and decoder, and hence need not be transmitted
as side information.
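Both ways of obtaining M(s) can be sketched in Python. The function names and the list-based band representation are hypothetical; the thresholds passed to the second function are assumed to have been computed from i(k) with whatever threshold mapping the codec uses.

```python
def max_bitplane_from_errors(e_band):
    """M(s) with 2^(M(s)-1) <= max|e(k)| < 2^M(s), for one frequency band."""
    return max(abs(e) for e in e_band).bit_length()

def max_bitplane_from_intervals(thr_lower, thr_upper):
    """M(s) from the quantizer intervals of a significant band.

    thr_lower[k] = thr(i(k)), thr_upper[k] = thr(i(k) + 1).
    Only i(k) is needed, so the decoder can repeat this computation.
    """
    widest = max(abs(abs(hi) - abs(lo))
                 for lo, hi in zip(thr_lower, thr_upper))
    return widest.bit_length()
```

For instance, a band with error coefficients `[5, -3, 0]` gives M(s) = 3, and quantizer intervals of widths 5 and 7 also give M(s) = 3, since both maxima lie in [4, 8).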

Figure 5 shows a decoder 500 according to an embodiment of the 
invention. 

The decoder 500 implements a perceptual bit-plane decoder
which comprises bit-plane shifting and conventional (sequential)
bit-plane decoding.

Analogously to the decoder 200, the decoder 500 comprises a
domain transformer 501, a de-quantizer 502, an error mapping
unit 503, a perceptual bit-plane decoder 504 (using a
perceptual model 506) and a de-multiplexer 505.

Similar to the perceptual bit-plane coder 404, the perceptual
bit-plane decoder 504 comprises a bit-plane shifting block 507
and a conventional BPC block 508.

The enhancement-layer bitstream generated by the encoder 400 is
bit-plane decoded by the decoder 500 in the consecutive
sequential manner (same sequential bit-plane scanning procedure
as the encoder 400) to reconstruct the bit-planes. The
reconstructed bit-planes are shifted in the reverse manner of
the encoder 400, based on the received or regenerated value
$\tau_i$, to generate the decoded error coefficients e'(k) which
describe the bit-plane decoded enhancement-layer bitstream.

Figure 6 shows an encoder 600 according to an embodiment of the
invention.

The encoder 600 uses perceptual bit-plane coding.

The encoder 600 comprises a domain transformer (IntMDCT) 601, a
quantizer (AAC quantizer and coder) 602, an error mapping unit
603, a perceptual significance calculation unit 604 (using a
psychoacoustic model 605), a perceptual bit-plane coding unit
606 and a multiplexer 607.

In this implementation, the scanning order of the bit-planes
and the bit-plane symbols need not be sequential, but is based on
the perceptual importance of the bit-plane symbols corresponding
to different frequency bands. The perceptual importance of the
bit-plane symbols is determined by calculating parameters
related to the perceptual information, such as the perceptual
significance and the first (maximum) bit-plane for bit-plane
decoding. The calculation of the perceptual information
parameters is represented as the perceptual significance
calculation block, i.e., the perceptual significance calculation
unit 604.

There are numerous ways to determine the perceptual importance,
or specifically the perceptual significance, of the bit-plane
symbols corresponding to different frequency bands. One widely
adopted way is by using a psychoacoustic model of the input
digital signal, such as the Psychoacoustic Model 2 described in
[19]. The just noticeable distortion (JND) level T(s) for
each frequency band determined using the psychoacoustic model
can be converted to the unit of bit-plane level $\tau(s)$ as
follows:

\[
\tau(s) = \tfrac{1}{2} \log_2 (T(s)).
\]

However, this invention does not constrain the method by which
T(s) or $\tau(s)$ is obtained.

Now let Ps(s) represent the perceptual significance of
frequency band s, which can be determined by the distance from
M(s) to $\tau(s)$ as

\[
Ps(s) = M(s) - \tau(s).
\]

It can be further noted that the noise level, or the level of
the IntMDCT error coefficients e(k), would tend to be flat with
respect to the JND level for significant bands (as a result of
the noise shaping mechanism in the core coder). In other words,
the value of Ps(s) would be very close, if not identical, for
significant frequency bands. This fact can be exploited in the
method according to the invention by sharing a common factor
Ps_common for all the significant bands. Possible selections of
Ps_common are the average value, the maximum value, the
minimum value, or any other reasonable function of Ps(s) for
all s that are significant. The Ps(s) can then be normalized as
follows:

\[
Ps'(s) = Ps(s) - Ps\_common.
\]

Since it is known that Ps'(s) would be zero for a significant
band s, it need not be transmitted to the decoder. For an
insignificant band s, on the other hand, Ps'(s) should preferably
be transmitted to the corresponding decoder as side information.

In some other examples, when there is no significant band,
Ps_common can be set to 0.
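A minimal Python sketch of this computation is given below. It assumes the average over the significant bands as the choice of Ps_common (one of the options mentioned above); the function names and the boolean `significant` flags are hypothetical.

```python
import math

def bitplane_level(jnd):
    """tau(s) = 0.5 * log2(T(s)) for a JND level T(s) > 0."""
    return 0.5 * math.log2(jnd)

def normalized_significance(M, T, significant):
    """Return (Ps'(s) for every band, Ps_common).

    M           : maximum bit-plane M(s) per band
    T           : JND level T(s) per band
    significant : True for bands that are significant in the core layer
    """
    ps = [m - bitplane_level(t) for m, t in zip(M, T)]       # Ps(s)
    sig = [p for p, flag in zip(ps, significant) if flag]
    common = sum(sig) / len(sig) if sig else 0.0             # Ps_common
    return [p - common for p in ps], common
```

With `M = [5, 5, 3]`, `T = [16.0, 16.0, 4.0]` and the first two bands significant, Ps(s) is 3, 3 and 2, Ps_common is 3, and only the value -1 for the insignificant third band would have to be sent as side information.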

It is also possible to use the noise shaping procedure in the
core encoder to cater to the need for perceptual coding. Hence
there is no need to further implement any noise shaping or
perceptual significance identification in the enhancement layer.
In such cases, Ps'(s) = 0 can be set for all s. Usually these
values do not need to be transmitted to the decoder if it is
known by the decoder that they are all zero.

A possible implementation of the perceptual bit-plane coding
mechanism can be described using the following pseudo code.
Here the total number of frequency bands is denoted as
s_total.

1. Find frequency band s with largest Ps'(s)

2. Encode bit-plane symbols of bit-plane M(s) for e(k) in band s

3. M(s) = M(s) - 1; Ps'(s) = Ps'(s) - 1

4. If there exists band s for which M(s) ≥ 0 goto 1.
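The scanning order produced by the pseudo code can be sketched in Python as follows. Two details are assumptions of this sketch rather than statements of the source: only bands that still have M(s) ≥ 0 are considered in step 1, and ties in Ps'(s) are resolved in favour of the lowest band index. The function name is hypothetical, and the actual symbol coding of step 2 is replaced by recording the (band, bit-plane) pair.

```python
def perceptual_scan_order(M, Ps):
    """Return the list of (band, bit_plane) pairs in coding order."""
    M, Ps = list(M), list(Ps)              # work on copies
    order = []
    while any(m >= 0 for m in M):          # step 4
        candidates = [s for s in range(len(M)) if M[s] >= 0]
        s = max(candidates, key=lambda b: Ps[b])   # step 1
        order.append((s, M[s]))            # step 2 (coding omitted)
        M[s] -= 1                          # step 3
        Ps[s] -= 1
    return order
```

Because the decoder knows or receives the same M(s) and Ps'(s), it can run the same loop to reproduce the order, replacing the encoding step with decoding.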

A method for obtaining the maximum bit-plane M(s) is described
here.

For a significant band, M(s) can be determined from the maximum
quantization interval of the quantizer if a perceptual
quantizer such as an AAC quantizer is used. Specifically, M(s)
is an integer that satisfies:

\[
2^{M(s)-1} \leq \max\bigl( \bigl|\, |thr(i(k)+1)| - |thr(i(k))| \,\bigr| \bigr) < 2^{M(s)}, \quad k \in s.
\]

In this case, M(s) does not need to be transmitted to the
decoder, as i(k) is known to the decoder.

For insignificant bands, M(s) can be calculated from e(k) as
follows:

\[
2^{M(s)-1} \leq \max(|e(k)|) < 2^{M(s)}, \quad k \in s,
\]

and for those bands, M(s) should preferably be sent to the
decoder as side information, since such information is not
contained in the core-layer bit-stream.

The value of the maximum bit-plane M(s) may also be predefined
in the encoder 600 and the corresponding decoder, and hence
need not be transmitted as side information.

Other alternative approaches to exploit the parameter Ps(s) in a
bit-plane coding approach, towards some desired noise shaping
goals, are also possible. In general, Ps(s) can also be
obtained by any function of M(s) and $\tau(s)$, for example the
following:

\[
Ps(s) = M(s) - 2\tau(s), \quad \text{or} \quad Ps(s) = \frac{M(s) - \tau(s)}{2}.
\]

Figure 7 shows a decoder 700 according to an embodiment of the
invention.

The decoder 700 is the decoder corresponding to the encoder 600,
wherein the perceptual bit-plane decoding is implemented using
the perceptual bit-plane scanning procedure as described above.

The decoder 700 accordingly comprises a domain transformer
(reverse IntMDCT) 701, a de-quantizer (AAC de-quantizer and
decoder) 702, an error mapping unit 703, a perceptual
significance calculation unit 704, a perceptual bit-plane
decoding unit 706 and a de-multiplexer 707.

In the decoder 700, for a significant band, Ps'(s) is set to
zero, and M(s) can be calculated from the AAC quantization index
i(k) in the same manner as in the encoder, i.e.:

\[
2^{M(s)-1} \leq \max\bigl( \bigl|\, |thr(i(k)+1)| - |thr(i(k))| \,\bigr| \bigr) < 2^{M(s)}, \quad k \in s.
\]

For an insignificant band, Ps(s) and M(s) can simply be recovered
from the transmitted side information. Once Ps(s) and M(s) are
recovered for all frequency bands, the IntMDCT error
coefficients e(k) can be easily reconstructed by decoding the
received bit-stream and reconstructing its bit-plane symbols in
an order that is exactly the same as that in the encoder 600. For
example, the decoding process for the encoding example given
above would be:

1. Find frequency band s with largest Ps'(s)

2. Decode bit-plane symbols of bit-plane M(s) for e(k) in band s

3. M(s) = M(s) - 1; Ps'(s) = Ps'(s) - 1

4. If there exists band s for which M(s) ≥ 0 goto 1.

Determining the maximum bit-plane for bit-plane coding of error
coefficients.

For a significant band s (i.e., the error coefficient
e(k) ≠ c(k), or ∃k ∈ s, i(k) ≠ 0), the maximum absolute value of
e(k) is bounded by the quantizer interval in the AAC quantizer
as:

\[
\max(|e(k)|) \leq thr(i(k)+1) - thr(i(k)).
\]

Therefore, the maximum bit-plane M(k) can be determined using:

\[
2^{M(k)-1} \leq \max\bigl( \bigl|\, |thr(i(k)+1)| - |thr(i(k))| \,\bigr| \bigr) < 2^{M(k)}, \quad k \in s.
\]

As i(k) is already known by the decoder, M(k) need not be
transmitted to the decoder, since the decoder is able to
regenerate thr(k) and hence M(k) from i(k) for a significant
band s.

For an insignificant band, M(k) can be calculated from e(k) as
follows:

\[
2^{M(k)-1} \leq \max(|e(k)|) < 2^{M(k)}, \quad k \in s,
\]

and the calculated M(s) is preferably transmitted with the
enhancement-layer bitstream as side information for the
enhancement-layer bitstream to be bit-plane decoded correctly.

To reduce the amount of side information, M(k) can be further
constrained to have the same value for all k in the same scale
factor band s of the core-layer quantizer. Therefore, M(k) may
also be denoted as M(s).

In the decoder 700, the error coefficients corresponding to the
error signal can be reconstructed by bit-plane decoding of the
enhancement-layer bitstream using the same bit-plane scanning
procedure as the encoder, based on M(s). For a significant band,
M(s) can be regenerated using the following:

\[
2^{M(s)-1} \leq \max\bigl( \bigl|\, |thr(i(k)+1)| - |thr(i(k))| \,\bigr| \bigr) < 2^{M(s)}, \quad k \in s.
\]

For an insignificant band, the decoder makes use of the M(s) which
is transmitted by the encoder as side information.